INTRODUCTION TO SPEECH SYNTHESIS


Speech synthesis can be used to add new lines of dialogue to NPCs that use base game voice types.


In Wyrmstooth, I use speech synthesis to give background NPCs, such as the hunters at 
Hunter’s Shack or the marauder leader at Cragwater Camp, unique things to say. 


In this section I'll be showing you how to train a custom Tacotron and WaveGlow model on the Google Colab platform using a dataset based on a voice type from The Elder Scrolls V: Skyrim.


Tacotron is a generative text-to-speech model that predicts mel spectrograms from text. For this tutorial I'll be using the NVIDIA Tacotron 2 repository available on GitHub: https://github.com/NVIDIA/tacotron2.
Figure 1 - NVIDIA Tacotron 2 on GitHub.


WaveGlow is a flow-based generative network that converts the mel spectrograms produced by Tacotron into audio. A custom WaveGlow model will significantly improve the quality of synthesized speech. For this tutorial I'll be using the NVIDIA WaveGlow repository available on GitHub: https://github.com/NVIDIA/waveglow.
Introduction to speech synthesis


Google Colab is a cloud platform that I'll be using in this tutorial to run Tacotron and WaveGlow. It lets you run Python code and other commands in a pre-built Python environment from your browser. We can link our Google Drive storage to a Colab session so we can access our dataset and save checkpoints back to it as we train.
Figure 2 - A Colab Notebook.


I've already set up a Colab Notebook with all the commands you'll need to run. You can access it here:

https://colab.research.google.com/drive/13vRLNPLqaVWGjeHGuUilKxxXuZw93PUBBr-eusp=sharing


The main advantage of using Google Colab is that we don't need to set up our own Python environment locally.


Here is some sample output from a few different models I’ve trained: 


tacotron sample femalenord1.wav
tacotron sample femalenord2.wav
tacotron sample femalenord3.wav
tacotron sample joerogan1.wav
tacotron sample joerogan2.wav
tacotron sample joerogan3.wav
tacotron sample femalecommander1.wav
tacotron sample femalecommander2.wav
tacotron sample femalecommander3.wav
tacotron sample malecommander1.wav
tacotron sample malecommander2.wav
tacotron sample malecommander3.wav
tacotron sample maleslycynical1.wav
tacotron sample maleslycynical2.wav
tacotron sample maleslycynical3.wav


As you can hear, Tacotron is able to pick up on various vocal nuances like the rolling r’s of the 
femalenord voice actor’s accent. 
PREPARING A DATASET USING VOICE ACTING FROM THE ELDER SCROLLS V: SKYRIM


In this section I'll be showing you how to prepare a new dataset from which we'll be training a new Tacotron and WaveGlow model.


A dataset consists of voice clips from a single speaker and their corresponding subtitles. 
Generally speaking, the more voice acting we have the better our results should be. Models 
based on datasets consisting of less than 3 hours of audio are going to struggle with articulation. 
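To see whether your dataset clears that rough 3-hour mark, you can add up the clip lengths. Below is a minimal sketch using Python's built-in `wave` module. It assumes the clips have already been converted to uncompressed PCM .wav files and sit together in one folder (`wave` can't read the game's original .xwm files); the function name is just for illustration.

```python
import wave
from pathlib import Path


def total_hours(folder: str) -> float:
    """Return the combined length, in hours, of every .wav file in `folder`.

    Assumes uncompressed PCM WAV clips; the `wave` module cannot read
    compressed formats such as the game's original .xwm files.
    """
    total_seconds = 0.0
    for clip in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(clip), "rb") as wav:
            # Frames divided by frames-per-second gives the clip length in seconds.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600
```

For example, `total_hours("/content/drive/MyDrive/femalenord/wavs")` would report the length of a femalenord dataset laid out the way this tutorial sets up its Google Drive folders.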


The first thing we need to do is extract the subtitle files from The Elder Scrolls V: Skyrim. 
Figure 3 - Creation Kit.
Once the Creation Kit loads, go to File > Data. Double-click on 'Skyrim.esm' then click OK and wait for it to load.


Figure 4 - Loading Skyrim.esm. 





Click 'Yes to All' for any warnings that pop up.
Go to Character > Export Dialogue and click OK in the Export Dialogue popup window.


Figure 5 - Exporting dialogue. 





This will create a dialogueExport.txt file in your Skyrim installation folder. 


Again, click 'Yes to All' for any warnings that pop up. Close the Creation Kit once the export has finished.


For this next part we'll need Notepad++ and Microsoft Excel, or a similar spreadsheet application. You can download Notepad++ for free from its website: https://notepad-plus-plus.org/downloads/.
Open dialogueExport.txt in Notepad++. It should look like this:


Figure 6 - dialogueExport.txt.


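We'll remove the unnecessary columns in Excel, but first we need to protect the commas in the dialogue: once the tabs are turned into commas, any comma inside a subtitle would be read as a column separator. Go to Search > Replace. In the 'Find what' field enter a comma ',' and in the 'Replace with' field enter an at symbol '@', then click Replace All.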
Figure 7 - Replacing commas with @ symbols.


Don't worry, we'll be reversing that change once we're done in Excel.
Keep Replace open. This time, in the 'Find what' field enter a tab character (with the Search Mode set to Extended you can type it as '\t'), and in the 'Replace with' field enter a comma ',' then click Replace All.


Figure 8 - Replacing tabs with commas.
Our dialogueExport.txt file should now look like this: 


Figure 9 - dialogueExport.txt modified. 


Close Notepad++. Rename dialogueExport.txt to dialogueExport.csv and open it in Excel. 
Figure 10 - dialogueExport.csv.
Delete all columns except 'FULL PATH' and 'RESPONSE TEXT'.


Figure 11 - Removed unnecessary columns. 


Remove the column headers. Right click on ‘1’ on the left hand side to select row 1, then select 
Delete. 
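If you'd rather script the column clean-up than do it in Excel, Python's `csv` module can read the tab-separated export directly and keep just the two columns. Because the delimiter stays a tab, commas in the dialogue never get confused with column breaks, so the '@' swap isn't needed. This is a sketch rather than the workflow used in this tutorial, and it assumes the header row names the columns exactly 'FULL PATH' and 'RESPONSE TEXT', as they appear in Excel; the function name is hypothetical.

```python
import csv
from typing import Iterable


def path_and_text(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Keep only the FULL PATH and RESPONSE TEXT columns of a dialogue export.

    `lines` can be an open file or any iterable of tab-separated lines whose
    first line is the header row (DictReader consumes it, so no header is
    returned). QUOTE_NONE stops quote marks in the dialogue from being
    treated as field delimiters.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # Short rows leave missing columns as None, so fall back to "".
        path = (row.get("FULL PATH") or "").strip()
        text = (row.get("RESPONSE TEXT") or "").strip()
        rows.append((path, text))
    return rows
```

Usage would look like `with open("dialogueExport.txt", encoding="utf-8", errors="replace", newline="") as f: rows = path_and_text(f)`. The UTF-8 encoding is an assumption, so change it if accented characters come out mangled.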
Next we need to clear out empty lines. Click on 'B' to select the entirety of column B, then click on the Sort and Filter button and select 'Sort A to Z'. Excel always places blank cells last when sorting, so the empty responses end up grouped together.


When the Sort Warning pops up, select 'Expand the selection' then click Sort.


Figure 12 - Expand the Selection. 


Delete the rows where column B is either blank or contains non-dialogue, such as (Deep breath), *cough* or <Laughter>.


Now we need to sort the rows by file path, which groups the lines by voice type. Click on 'A' to select the entirety of column A, then click on the Sort and Filter button and select 'Sort A to Z'.
Again, when the Sort Warning pops up, select 'Expand the selection' then click Sort.
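The clean-up and sort above can be sketched in Python as well. This assumes the rows are (path, text) pairs and drops any row whose text contains a parenthesised, asterisked or angle-bracketed span anywhere, on the reasoning that the audio for such a line contains sounds the transcript doesn't spell out. The sort ignores case, like Excel's default sort; the function names and the regular expression are my own.

```python
import re

# Stage directions such as "(Deep breath)", "*cough*" or "<Laughter>".
NON_DIALOGUE = re.compile(r"\([^)]*\)|\*[^*]*\*|<[^>]*>")


def is_dialogue(text: str) -> bool:
    """True if `text` is non-blank and contains no stage directions."""
    return bool(text.strip()) and NON_DIALOGUE.search(text) is None


def clean_and_sort(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop blank or non-dialogue rows, then sort by file path ignoring case."""
    kept = [(path, text) for path, text in rows if is_dialogue(text)]
    # Sorting by path groups the rows by voice type folder.
    return sorted(kept, key=lambda row: row[0].lower())
```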
Figure 13 - dialogueExport.csv properly sorted.
Save the document then close Excel. Open dialogueExport.csv again in Notepad++. 


In Notepad++, go to Search > Replace. In the ‘Find what’ field enter an at symbol ‘@’, and in 
the ‘Replace with’ field enter a comma ‘,’ then click Replace All. 
Preparing a dataset using voice acting from The Elder Scrolls V: Skyrim





Next, we need to change the file extensions and the separator between the file path and the subtitle. In the 'Find what' field enter 'xwm,' and in the 'Replace with' field enter 'wav|' then click Replace All.


Figure 14 - Replacing the file extensions and separator character. 


We need to make everything lower case. To do this, go to Edit > Select All then go to Edit > 
Convert Case To > lowercase. 


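The file paths need to be modified for Google Drive. In the 'Find what' field enter 'data\sound\voice\skyrim.esm\' and in the 'Replace with' field enter '/content/drive/MyDrive/' then click Replace All. '/content/drive/MyDrive/' is where the root of your Google Drive appears when it's mounted at /content/drive in Colab.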


Figure 15 - Changing the file paths. 


Lastly, in the ‘Find what’ field enter ‘\’ and in the ‘Replace with’ field enter ‘/wavs/’ then click 
Replace All. 


Figure 16 - Adding wavs folder to file path. 
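All of the Notepad++ replacements above collapse into one small function if you work on (path, text) rows instead of raw text. A minimal sketch, assuming every path starts with data\sound\voice\skyrim.esm\ once lowercased, as the Skyrim.esm paths in this export do, and leaves exactly one backslash between the voice type folder and the file name. Like the tutorial, it lowercases the subtitle too. The function and constant names are hypothetical.

```python
GAME_PREFIX = "data\\sound\\voice\\skyrim.esm\\"
DRIVE_PREFIX = "/content/drive/MyDrive/"


def to_filelist_line(path: str, text: str) -> str:
    """Convert one exported (path, text) row into a 'wav path|subtitle' line."""
    wav_path = path.lower()
    # Equivalent of replacing 'xwm,' with 'wav|': swap the extension.
    if wav_path.endswith(".xwm"):
        wav_path = wav_path[: -len(".xwm")] + ".wav"
    # Point the game's voice folder at the mounted Google Drive instead.
    if wav_path.startswith(GAME_PREFIX):
        wav_path = DRIVE_PREFIX + wav_path[len(GAME_PREFIX):]
    # The remaining backslash separates the voice type from the file name.
    wav_path = wav_path.replace("\\", "/wavs/")
    return f"{wav_path}|{text.lower()}"
```

Calling it on each (path, text) row and writing one result per line should give a file in the same format as Figure 17.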


We now have a list of subtitles grouped by voice type from which we can easily make new training and validation files.


Figure 17 - dialogueExport.csv formatted for Google Colab.
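Tacotron 2 reads its training and validation lists by splitting each line on '|' (in NVIDIA's repository that's `load_filepaths_and_text` in utils.py, which opens the file as UTF-8). A subtitle containing '|' would therefore be cut short, and a line with no separator or a stray blank line will likely raise an error when it's loaded. Here's a small checker you could run on a list before uploading it; the function and the checks it makes are my own assumptions, not part of the repository.

```python
from typing import Iterable


def check_filelist(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for lines that would trip up a
    loader which splits every line on '|' into a path and a transcript."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            problems.append((number, "blank line"))
            continue
        fields = line.split("|")
        if len(fields) != 2:
            problems.append((number, f"expected 1 '|' separator, found {len(fields) - 1}"))
            continue
        path, text = fields
        if not path.endswith(".wav"):
            problems.append((number, "path does not end in .wav"))
        if not text.strip():
            problems.append((number, "empty transcript"))
    return problems
```

For example, `with open("femalenord_training.txt", encoding="utf-8") as f: print(check_filelist(f))` prints an empty list when every line looks well-formed.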


I've uploaded a couple of pre-sanitized dialogueExport files from different Bethesda Game Studios titles to make it easier for you to set up new training and validation files.


Skyrim:
https://drive.google.com/file/d/1IqghCZWOoOsWwK6VDIvakPnUmyMS66As1/view?usp=sharing

Oblivion:
https://drive.google.com/file/d/1luxbNONwZcWfeqSYzRoFBLOt4ixXZNv5cl/view?usp=shari


Next we need to create a training text file. Select all the voice lines for the specific voice type you 
plan on training. 


Figure 18 - Select all lines of dialogue for a specific voice type. 


Create a new text file and name it '#_training.txt' where '#' is the name of the voice type you're training, e.g. femalenord. In my example that's going to be 'femalenord_training.txt' because I'll be making a dataset based on the femalenord voice type.


Copy the lines you highlighted from dialogueExport.csv and paste them into your #_training.txt 
file. 
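Picking out one voice type can also be scripted, since every path in the formatted file contains the voice type folder. A sketch, assuming the file uses the /content/drive/MyDrive/<voice type>/wavs/ layout set up above; the function names are hypothetical. Matching on the whole '<voice type>/wavs/' folder stops femalenord from also catching a voice type whose name merely starts with the same letters, and reading with 'utf-8-sig' strips a byte-order mark if the editor added one.

```python
from pathlib import Path

DRIVE_PREFIX = "/content/drive/MyDrive/"


def lines_for_voice_type(lines: list[str], voice_type: str) -> list[str]:
    """Keep only the lines whose .wav file sits in <voice_type>/wavs/."""
    folder = f"{DRIVE_PREFIX}{voice_type.lower()}/wavs/"
    return [line for line in lines if line.startswith(folder)]


def write_training_file(export_file: str, voice_type: str, out_dir: str = ".") -> Path:
    """Write <voice_type>_training.txt into out_dir and return its path."""
    # 'utf-8-sig' decodes plain UTF-8 too, but drops a leading BOM if present.
    text = Path(export_file).read_text(encoding="utf-8-sig")
    selected = lines_for_voice_type(text.splitlines(keepends=True), voice_type)
    out_path = Path(out_dir) / f"{voice_type.lower()}_training.txt"
    out_path.write_text("".join(selected), encoding="utf-8")
    return out_path
```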


Tacotron 2 Speech Synthesis Tutorial 15 




Now highlight at least 50 lines from your training text file. 


Figure 19 - 50 lines highlighted. 


Create another text file and name it ‘#_validation.txt’, where ‘#’ is the name of the voice type
you're training, i.e. femalenord. In my example that’s going to be ‘femalenord_validation.txt’.


Cut the highlighted lines out of the training text file and paste them into the validation text file. 




Figure 20 - Validation text file. 


Tacotron 2 Speech Synthesis Tutorial 16 


Preparing a dataset using voice acting from The Elder Scrolls V: Skyrim 


Save both your training and validation text files. 
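If you'd rather not cut and paste by hand, the split described above can be sketched in Python. This is a minimal sketch, not part of the tutorial's workflow: it assumes the validation set is simply the last 50 entries of the training file, and `split_filelist` and `write_split` are hypothetical helper names.

```python
from pathlib import Path


def split_filelist(lines, n_val=50):
    """Split filelist entries into (training, validation).

    Entries look like '/content/drive/MyDrive/femalenord/wavs/NAME.wav|text'.
    Blank lines are dropped; the last n_val entries become the validation set.
    """
    entries = [line.rstrip("\r\n") for line in lines if line.strip()]
    if not 0 < n_val < len(entries):
        raise ValueError("n_val must leave at least one training entry")
    return entries[:-n_val], entries[-n_val:]


def write_split(training_path, validation_path, n_val=50):
    """Move the last n_val entries of the training file into the validation file."""
    training_path, validation_path = Path(training_path), Path(validation_path)
    lines = training_path.read_text(encoding="utf-8").splitlines()
    train, val = split_filelist(lines, n_val)
    training_path.write_text("\n".join(train) + "\n", encoding="utf-8")
    validation_path.write_text("\n".join(val) + "\n", encoding="utf-8")
    return len(train), len(val)
```

For example, `write_split("femalenord_training.txt", "femalenord_validation.txt")` rewrites the training file without those 50 lines and writes them to the validation file. Run it only once, since every run moves another 50 lines.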


Now that we’ve prepared the training and validation files, we can move on to preparing the 
audio. 


Extract the voice acting from the ‘Skyrim - Voices.bsa’ archive. To do this we'll need a copy of 
BAE, the Bethesda Archive Extractor, which you can download from Nexusmods. 




Figure 21 - BAE on Nexusmods. 


Extract the archive to a folder and run bae.exe. 


Drag and drop the ‘Skyrim - Voices.bsa’ file into BAE to open it. 


Figure 22 - Skyrim - Voices.bsa opened in BAE.
Click ‘Select All’ to ensure everything is selected, then click Extract.




Figure 23 - Selecting a folder to extract to. 


Select a folder to extract the voice files to, then click on Select Folder. 
Once BAE has finished extracting everything from the ‘Skyrim - Voices.bsa’ archive, navigate to
the folder you selected, then go to sound > voice > skyrim.esm. You should see a folder for
each voice type, and these folders should contain .fuz files.




Figure 24 - Extracted voice type folders. 


We'll need to convert those .fuz files to .wav next. 


Go ahead and download Unfuzer from Nexusmods. 


Extract the archive to a folder and run unfuzer.exe. 




Figure 25 - Unfuzer. 


Click on the file path beneath ‘Folder to parse’. Navigate to the folder you extracted the
voice files to, go to sound > voice and select the ‘skyrim.esm’ folder.


Tick ‘Parse subfolders’, make sure ‘Process .lip Files’ isn’t ticked, then click Unfuz (Fuz->Xwm->Wav).
This will convert every .fuz file in the skyrim.esm subfolders to .wav.


Note: Converting all the .fuz files to .wav will take some time. 
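For the curious, the first half of Unfuzer's "Fuz->Xwm->Wav" conversion is just unpacking a container. The sketch below assumes the commonly documented .fuz layout (a ‘FUZE’ tag, a version number, the size of the embedded lip-sync data, the lip data itself, then the xWMA audio); treat that layout as an assumption, and `split_fuz` as a hypothetical helper. The xWMA part still has to be decoded to PCM .wav, which is the step Unfuzer performs for you, so this is an illustration rather than a replacement.

```python
import struct


def split_fuz(data: bytes):
    """Split the raw bytes of a .fuz file into (lip_bytes, xwm_bytes).

    Assumed layout, little-endian:
      bytes 0-3   b"FUZE" tag
      bytes 4-7   uint32 version
      bytes 8-11  uint32 size of the lip-sync block
      then the lip-sync block, then the xWMA audio until end of file.
    """
    if data[:4] != b"FUZE":
        raise ValueError("not a .fuz file")
    _version, lip_size = struct.unpack_from("<II", data, 4)
    start = 12
    end = start + lip_size
    if end > len(data):
        raise ValueError("lip-sync size runs past the end of the file")
    return data[start:end], data[end:]
```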
In the ‘skyrim.esm’ folder, create a new text file and rename it ‘delete_fuz.bat’.


Figure 26 - Created delete_fuz.bat.


Open the batch file in Notepad and enter the following commands: 


REM Delete every .fuz file in this folder and all of its subfolders (/S)
del /S *.fuz
pause


Figure 27 - delete_fuz.bat contents.


Save and close Notepad then run the batch file. This will delete all .fuz files from the subfolders 
leaving only the .wav files behind. 
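`del /S *.fuz` deletes matching files in the current folder and every subfolder. A rough Python equivalent, which also works outside Windows, could look like this. `delete_by_extension` is a hypothetical helper, and it defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path


def delete_by_extension(root, ext=".fuz", dry_run=True):
    """Find (and optionally delete) every file under root with the given extension.

    Like `del /S *.fuz`, this searches root and all of its subfolders. The
    extension check is case-insensitive, as it is on Windows. With
    dry_run=True nothing is deleted; the matching paths are only returned.
    """
    matches = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ext.lower()
    )
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

For example, `delete_by_extension(r"D:\Projects\Python\skyrim_voice_files\sound\voice\skyrim.esm", dry_run=False)` would do the same job as the batch file, using the folder from the screenshots; adjust the path to your own.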
We'll need to create a project directory for this next part. I just made a folder called Datasets, but
you can name it whatever you like.




Figure 28 - New project directory for setting up our dataset. 


Copy the folder containing the .wav files for your chosen voice type over to your project folder.
In my example I'll be copying the femalenord folder from the ‘skyrim.esm’ folder over to the
Datasets folder I just created.
Once you’ve copied it to the project folder, append ‘_base’ to the folder name. In my example, that gives femalenord_base.




Figure 29 - Copied the unprocessed audio files to the project folder. 


Tacotron and WaveGlow require audio files to be formatted in a specific way, so in this
section I'll be showing you how to batch process the audio files for your dataset. To do
so, we'll need to download two utilities: SoX and ffmpeg.


SoX can be downloaded from its page on SourceForge: http://sox.sourceforge.net/.




Figure 30 - SoX website. 
Once downloaded, extract the archive to an easily accessible location. On my end I just extracted
SoX to a new folder in Program Files (x86).


ffmpeg can be downloaded from its website. Again, once you’ve downloaded it, extract the files
to an easily accessible location. On my end I just extracted ffmpeg to a new folder in Program
Files (x86).




Figure 31 - ffmpeg website. 


Firstly, if you’re copying audio files from Oblivion, or from a game where the audio files aren’t in 
.wav format, we'll need to convert them to .wav first. 


Create a new batch file and name it ‘0_mp3_to_wav.bat’. Open it in Notepad and enter the
following commands:


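REM Convert every .mp3 in the base folder to a .wav with the same name (%%~na = name without extension)
cd "D:\Projects\Python\Datasets\femalenord_base\"


for %%a in (*.mp3) do "C:\Program Files (x86)\ffmpeg\bin\ffmpeg.exe" -i "%%a" "%%~na.wav"
pause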


Figure 32 - 0_mp3_to_wav.bat.


Note: You'll need to change the folder paths to match your environment. In this example I'm converting
.mp3 files to .wav. If your files are already in .wav format you won’t need to do this.
Save and close Notepad then run ‘0_mp3_to_wav.bat’ and wait for it to complete. The .wav files
will be written to the same folder as the original files.
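The batch file runs ffmpeg once per .mp3, naming each output after its input. The same loop, sketched in Python, assuming ffmpeg lives at the path used above: `mp3_to_wav_commands` and `run_commands` are hypothetical helpers, and the first one only builds the command lines so you can inspect them before running anything.

```python
import subprocess
from pathlib import Path

# Install location used in this tutorial; change it to match your setup.
FFMPEG = r"C:\Program Files (x86)\ffmpeg\bin\ffmpeg.exe"


def mp3_to_wav_commands(folder, ffmpeg=FFMPEG):
    """Build one `ffmpeg -i NAME.mp3 NAME.wav` command per .mp3 in folder."""
    commands = []
    for mp3 in sorted(Path(folder).glob("*.mp3")):
        wav = mp3.with_suffix(".wav")  # same name and folder, .wav extension
        commands.append([ffmpeg, "-i", str(mp3), str(wav)])
    return commands


def run_commands(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_commands(mp3_to_wav_commands(r"D:\Projects\Python\Datasets\femalenord_base"))` would then perform the conversion.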


Now we need to trim silence from our .wav files. To do this, create a new batch file and name it 
‘1_trim_silence.bat’. Open it in Notepad and enter in the following commands: 


if not exist "D:\Projects\Python\Datasets\output_1" mkdir "D:\Projects\Python\Datasets\output_1"
cd "D:\Projects\Python\Datasets\femalenord_base"
REM Trim quiet audio from the start, reverse the clip, trim again (the old end), then reverse it back.
REM Inside a batch file a literal percent sign has to be written as %%.
FOR %%F IN (*.wav) DO "C:\Program Files (x86)\sox-14-4-2\sox.exe" "%%F" "D:\Projects\Python\Datasets\output_1\%%~nxF" silence 1 0.1 1%% reverse silence 1 0.1 1%% reverse
pause


Figure 33 - 1_trim_silence.bat.


Note: You'll need to change the folder paths to match your environment. We'll be reading .wav files
from the #_base folder and outputting to a folder called output_1.


Save and close Notepad then run ‘1_trim_silence.bat’ and wait for it to complete.
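The SoX effects chain works in four steps: `silence 1 0.1 1%` removes audio from the start until it detects 0.1 seconds of sound above 1% of full volume, `reverse` flips the clip so the old ending becomes the start, the second `silence` trims that end, and the final `reverse` restores the original direction. The Python below is a simplified illustration of that idea, not a re-implementation of SoX: it measures loudness as the RMS of 10 ms blocks of 16-bit samples (an assumption; SoX's detector differs in detail), and `trim_silence` is a hypothetical name.

```python
import math


def _block_rms(samples, block):
    """RMS level of consecutive, non-overlapping blocks of `block` samples."""
    levels = []
    for i in range(0, len(samples), block):
        chunk = samples[i:i + block]
        levels.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    return levels


def _first_loud_run(levels, limit, run):
    """Index of the first block that starts `run` consecutive blocks above limit."""
    count = 0
    for i, level in enumerate(levels):
        count = count + 1 if level > limit else 0
        if count >= run:
            return i - run + 1
    return None


def trim_silence(samples, sample_rate, threshold=0.01, min_duration=0.1,
                 block_seconds=0.01, full_scale=32768):
    """Trim leading and trailing quiet audio from a list of 16-bit samples.

    Audio is kept from the first stretch of at least min_duration seconds
    louder than threshold * full_scale, to the end of the last such stretch.
    """
    block = max(1, int(block_seconds * sample_rate))
    run = max(1, round(min_duration / block_seconds))
    limit = threshold * full_scale
    levels = _block_rms(samples, block)
    start = _first_loud_run(levels, limit, run)
    if start is None:
        return []  # nothing loud enough: the whole clip counts as silence
    # Searching the reversed levels finds the end, like SoX's reverse trick.
    last = len(levels) - 1 - _first_loud_run(levels[::-1], limit, run)
    return samples[start * block:(last + 1) * block]
```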
Next we'll need to add some silence back to the end of each .wav file. To do this, create
a new batch file and name it ‘2_add_silence.bat’. Open it in Notepad and enter the following
commands:


if not exist "D:\Projects\Python\Datasets\output_2" mkdir "D:\Projects\Python\Datasets\output_2"

cd "D:\Projects\Python\Datasets\output_1\"

REM Add 0 seconds of silence at the start and 0.1 seconds at the end of each file
FOR %%F IN (*.wav) DO "C:\Program Files (x86)\sox-14-4-2\sox.exe" "%%F" "D:\Projects\Python\Datasets\output_2\%%~nxF" pad 0 0.1

pause


Figure 34 - 2_add_silence.bat.


Note: You'll need to change the folder paths to match your environment. We'll be reading .wav files
from the output_1 folder and outputting to a folder called output_2.


Save and close Notepad then run ‘2_add_silence.bat’ and wait for it to complete. 
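`pad 0 0.1` tells SoX to add 0 seconds of silence at the start and 0.1 seconds at the end. Assuming 44100Hz source audio (this step runs before downsampling), that works out to 44100 × 0.1 = 4,410 zero-valued samples per file. A tiny sketch of the same operation on a list of 16-bit samples, using a hypothetical `pad_end` helper:

```python
def pad_end(samples, sample_rate, seconds=0.1):
    """Return samples with `seconds` of digital silence (zeros) appended.

    Mirrors the idea of SoX's `pad 0 0.1`: nothing added at the start,
    `seconds` of silence added at the end.
    """
    extra = round(sample_rate * seconds)
    return list(samples) + [0] * extra
```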
We'll need to downsample the audio to 22050Hz, the sampling rate NVIDIA's Tacotron 2 and WaveGlow
are configured for by default. Voice acting in most Bethesda Game Studios titles is recorded at 44100Hz.
Create a new batch file and name it ‘3_downsample.bat’. Open it in Notepad and enter the following commands:


if not exist "D:\Projects\Python\Datasets\output_3" mkdir "D:\Projects\Python\Datasets\output_3"

cd "D:\Projects\Python\Datasets\output_2\"

REM Resample each file to 22050Hz using SoX's very high quality (-v) setting
FOR %%F IN (*.wav) DO "C:\Program Files (x86)\sox-14-4-2\sox.exe" "%%F" "D:\Projects\Python\Datasets\output_3\%%~nxF" rate -v 22050
pause


Figure 35 - 3_downsample.bat.


Note: You'll need to change the folder paths to match your environment. We'll be reading .wav files
from the output_2 folder and outputting to a folder called output_3.


Save and close Notepad then run ‘3_downsample.bat’ and wait for it to complete. 
Lastly, we need to make sure all our .wav files are in mono. Create a new batch file and name it
‘4_mono.bat’. Open it in Notepad and enter the following commands:


if not exist "D:\Projects\Python\Datasets\output_4" mkdir "D:\Projects\Python\Datasets\output_4"

cd "D:\Projects\Python\Datasets\output_3\"

REM Mix each file down to a single audio channel (-ac 1)
for %%S in (*.wav) do "C:\Program Files (x86)\ffmpeg\bin\ffmpeg.exe" -i "%%S" -ac 1 "D:\Projects\Python\Datasets\output_4\%%S"

pause


Figure 36 - 4_mono.bat.


Note: You'll need to change the folder paths to match your environment. We'll be reading .wav files
from the output_3 folder and outputting to a folder called output_4.


Save and close Notepad then run ‘4_mono.bat’ and wait for it to complete. 
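`-ac 1` asks ffmpeg to mix all channels down to one; a file that is already mono passes through as a single channel. Conceptually, downmixing stereo means combining each left/right pair into one sample. The sketch below uses a plain average, which is an assumption for illustration (ffmpeg's own mixing coefficients may differ), and `stereo_to_mono` is a hypothetical helper.

```python
def stereo_to_mono(interleaved):
    """Downmix interleaved stereo samples [L0, R0, L1, R1, ...] to mono.

    Each mono sample is the average of its left and right samples,
    truncated toward zero so it stays an integer in the 16-bit range.
    """
    if len(interleaved) % 2:
        raise ValueError("expected an even number of interleaved samples")
    left, right = interleaved[0::2], interleaved[1::2]
    return [int((l_s + r_s) / 2) for l_s, r_s in zip(left, right)]
```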
Rename the folder output_4 to match the name of the voice type you're setting up a dataset for.
In my example, that will be femalenord.




Figure 37 - Renamed the output_4 folder. 


Create a new folder named ‘wavs’ within it, then move all the .wav files into the ‘wavs’ folder. 




Figure 38 - wavs folder. 
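Before uploading, it's worth confirming every file came out as 22050Hz, mono, 16-bit audio (ffmpeg writes 16-bit PCM to .wav by default, and Tacotron 2's default max_wav_value of 32768 assumes 16-bit samples). A minimal sketch using Python's built-in `wave` module; `check_wavs` is a hypothetical helper.

```python
import wave
from pathlib import Path


def check_wavs(folder, rate=22050, channels=1, sampwidth=2):
    """Return (filename, problem) pairs for .wav files not in the target format.

    sampwidth is in bytes, so 2 means 16-bit. Files the wave module cannot
    read are reported as unreadable rather than stopping the check.
    """
    problems = []
    for path in sorted(Path(folder).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as w:
                got = (w.getframerate(), w.getnchannels(), w.getsampwidth())
        except (wave.Error, EOFError) as exc:
            problems.append((path.name, f"unreadable: {exc}"))
            continue
        if got != (rate, channels, sampwidth):
            problems.append((path.name, f"rate/channels/width = {got}"))
    return problems
```

For example, `check_wavs(r"D:\Projects\Python\Datasets\femalenord\wavs")` returning an empty list means every file matched.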


Go to your Google Drive: https://drive.google.com/. 
Create a new folder in Google Drive called ‘filelists’, then drag-and-drop the training and
validation text files into your browser to upload them. In my example that’s going to be the
femalenord_training.txt and femalenord_validation.txt files.




Figure 39 - Uploading the training and validation files. 


Drag-and-drop the folder containing the processed .wav files into your browser, at the root of
your Google Drive, to upload it. In my example that’s going to be the femalenord folder.




Figure 40 - Uploading the processed audio files. 
MOUNTING YOUR GOOGLE DRIVE STORAGE IN GOOGLE COLAB


As mentioned earlier, I've already set up a Colab Notebook with all the commands you'll need to
run. You can access it here:


https://colab.research.google.com/drive/13vRLNPLgVWGjgHGuUiJKxXuZw93PUBBreusp=sharing.











Figure 41 - Colab Notebook. 
The first thing we need to do is go up to Runtime > Change Runtime Type.




Figure 42 - Notebook settings. 


Make sure ‘Hardware accelerator’ is set to GPU then click Save. 
Figure 43 - Clicking Connect.


Click Connect near the top-right to start a new Colab session. 
It’s important to note that sessions are temporary. Anything we install or save to our session will
be lost once the session is terminated. We can terminate sessions manually by clicking on the
down arrow next to the Connect button, selecting Manage Sessions, then clicking Terminate next to
our active session.




Figure 44 - Active session. 


If Terminate is greyed out, just refresh the Colab tab in your browser then try again. 


Important: We’re provided free access to a physical GPU. If a GPU session is left idle it will terminate
automatically and you will need to reconnect.


It’s also important to note that sessions will only persist for a few hours, depending on the overall system 
usage of the Google Colab service. 


Because sessions are temporary we'll need to connect our Colab session to our Google Drive 
storage so we can both access our dataset and save the checkpoints and log files from Tacotron 
and WaveGlow back to it. 
First, a bit about Colab Notebooks. Colab Notebooks are made up of two types of cells: Text
Cells and Code Cells.




Figure 45 - A text cell. 
Text Cells are used to add labels. 


Figure 46 - A code cell. 
Code Cells are used to run commands or python code. 


Figure 47 - Run cell button. 


To execute a code cell, click on the Run Cell button. 


Run the code cell under Mount Google Drive. 








Figure 48 - Mount Google Drive.





We'll need to verify our connection before we can link our Google Drive storage to our Colab 
session. Click on the link produced in the cell. 



Figure 49 - Google Drive verification. 
Click Allow. 
Figure 50 - Authentication code.


Copy the authentication code and paste it into the text field back in the code cell, as shown in
Figure 51, then press Enter.


Figure 51 - Entered the authentication code.
Figure 52 - Successfully mounted Google Drive storage.


Once it’s mounted successfully, the cell will report ‘Mounted at /content/drive’. 
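With the drive mounted, you can check that every path in your filelists points at a file that exists before you start training. A minimal sketch you could paste into a new code cell; `find_missing_audio` is a hypothetical helper, and the example paths follow the ‘/content/drive/MyDrive/femalenord/wavs/NAME.wav|transcript’ layout used in the filelists, so adjust them to your voice type.

```python
import os


def find_missing_audio(filelist_lines, exists=os.path.exists):
    """Return filelist entries whose audio file can't be found.

    Each non-empty line should look like 'PATH_TO_WAV|transcript'. Lines
    without a '|' separator are returned as-is so they can be fixed too.
    """
    problems = []
    for line in filelist_lines:
        line = line.strip()
        if not line:
            continue
        path, sep, _transcript = line.partition("|")
        if not sep:
            problems.append(line)  # malformed: no '|' between path and text
        elif not exists(path):
            problems.append(path)
    return problems
```

Pass it the lines of femalenord_training.txt (or femalenord_validation.txt) from /content/drive/MyDrive/filelists/; an empty list means every file was found.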


Run the code cell under Check GPU. 
Figure 53 - Check GPU.
This will tell us what kind of GPU has been assigned to our session. For P4 and K80 GPUs, you may need to lower the batch_size setting later on if you run into an ‘Out of Memory’ error when you start training.
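If you want to flag the cards mentioned above automatically, the sketch below asks nvidia-smi for the GPU name and checks it against P4 and K80. It assumes nvidia-smi is available in the runtime; the helper names are hypothetical, and the GPU list only reflects the advice above, not a measured memory limit.

```python
import subprocess

# GPU models the tutorial warns may need a smaller batch_size (illustrative list).
SMALL_MEMORY_GPUS = {"P4", "K80"}

def needs_smaller_batch(gpu_name):
    """True if a whole token of the GPU name matches one of the flagged models."""
    tokens = gpu_name.replace("-", " ").split()
    return any(token in SMALL_MEMORY_GPUS for token in tokens)

def query_gpu_name():
    """Ask nvidia-smi for the first GPU's name; return None if it can't be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None

if __name__ == "__main__":
    name = query_gpu_name()
    if name is None:
        print("No GPU detected. Is the runtime type set to GPU?")
    elif needs_smaller_batch(name):
        print(f"{name}: consider lowering batch_size in hparams.py")
    else:
        print(f"{name}: no change suggested")
```

Matching whole tokens rather than substrings avoids flagging a card such as a P40 just because its name contains ‘P4’.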
TRAINING A TACOTRON MODEL IN GOOGLE COLAB


Scroll down to the section labelled Install Tacotron 2. 


Run the code cell under Setup Tacotron 2. This will install Tacotron 2 and its dependencies to your Colab session.


Figure 54 - Installing Tacotron 2. 


Run the code cell under Download Default Tacotron Model. This will download the default 
Tacotron model that we'll be using to warm-start our own model from. 
Figure 55 - Download default Tacotron model.
Scroll down to Tacotron 2 Training. 


In the code cell beneath Set Model Name, change ‘tacotron_femalenord’ to match the voice 
type of the dataset you’ve prepared. 


Figure 56 - Set model name. 


By default, Tacotron saves each checkpoint as a separate file named ‘checkpoint_{}’, where ‘{}’ is replaced by the current iteration number. Removing ‘{}’ from the model name tells Tacotron to keep just one checkpoint and overwrite it whenever it saves a new one.


A free Google Drive account is limited to 15 GB of storage, and this can be used up pretty quickly when training a Tacotron or WaveGlow model if we save checkpoints individually.
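To see why removing ‘{}’ saves space, here’s a small simulation of saving every 200 iterations. It assumes the save step builds the file name with Python’s str.format(iteration), which is how NVIDIA’s train.py names its ‘checkpoint_{}’ files; the helper functions and the 2,000-iteration run are hypothetical. Because str.format() quietly ignores arguments the template doesn’t use, a name without ‘{}’ comes out the same every time, so each save overwrites the previous one.

```python
import os

def checkpoint_path(output_directory, name_template, iteration):
    # The name is filled in with str.format(iteration). If the template has no
    # '{}', format() ignores the extra argument and returns the name unchanged.
    return os.path.join(output_directory, name_template.format(iteration))

def files_written(name_template, total_iterations, iters_per_checkpoint=200):
    """Distinct checkpoint paths produced by a simulated training run."""
    return {
        checkpoint_path("tacotron2_checkpoints", name_template, it)
        for it in range(1, total_iterations + 1)
        if it % iters_per_checkpoint == 0
    }

if __name__ == "__main__":
    print(len(files_written("checkpoint_{}", 2000)))        # prints 10
    print(len(files_written("tacotron_femalenord", 2000)))  # prints 1
```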


Click Run Cell once you’ve changed the model name. 


Scroll down to Configure hparams.py. 





Figure 57 - Configure hparams.py. 


Set the paths to your training and validation text files. By default, I have the training file path set to ‘/content/drive/MyDrive/filelists/femalenord_training.txt’ and the validation file path set to ‘/content/drive/MyDrive/filelists/femalenord_validation.txt’.


The other settings can be left as is for now. 


Click Run Cell once you’ve changed the file names for the training and validation files. 
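The Configure hparams.py cell edits these paths with sed. If you’d rather make the change in Python, here’s a rough equivalent. It assumes hparams.py sets the paths as training_files='...' and validation_files='...' (as it does in the NVIDIA repository) and that Tacotron 2 was installed to /content/tacotron2; set_hparam_string is a hypothetical helper, not something the notebook provides.

```python
import os
import re

def set_hparam_string(source, key, value):
    """Replace key='old value' with key='value' in hparams.py source text."""
    pattern = re.compile(r"(\b%s\s*=\s*)'[^']*'" % re.escape(key))
    new_source, count = pattern.subn(lambda m: m.group(1) + "'" + value + "'", source)
    if count == 0:
        raise KeyError(key + " not found in hparams.py")
    return new_source

if __name__ == "__main__":
    hparams_file = "/content/tacotron2/hparams.py"  # assumed install location
    if os.path.exists(hparams_file):
        with open(hparams_file) as f:
            text = f.read()
        text = set_hparam_string(text, "training_files",
                                 "/content/drive/MyDrive/filelists/femalenord_training.txt")
        text = set_hparam_string(text, "validation_files",
                                 "/content/drive/MyDrive/filelists/femalenord_validation.txt")
        with open(hparams_file, "w") as f:
            f.write(text)
```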


Scroll down to Warm Start. 
Figure 58 - Warm Start.


Warm Start lets us begin training a new model from the default Tacotron model instead of from scratch. This is useful for small datasets like the ones we make from Skyrim or Oblivion.
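Under the hood, warm starting copies the pretrained weights into the new model but skips any layers listed in hparams.ignore_layers (in the NVIDIA repository this defaults to the text embedding, ‘embedding.weight’), and training starts from iteration 0 without the old optimizer state. The sketch below mimics that merge with plain dictionaries standing in for PyTorch state dicts; it’s a simplified illustration with a hypothetical function name, not the repository’s own code.

```python
def warm_start_weights(pretrained, fresh, ignore_layers=("embedding.weight",)):
    """Merge pretrained weights into a freshly initialised model.

    Layers listed in ignore_layers keep their fresh values; every other
    layer is copied from the pretrained checkpoint.
    """
    kept = {name: w for name, w in pretrained.items() if name not in ignore_layers}
    merged = dict(fresh)
    merged.update(kept)
    return merged

if __name__ == "__main__":
    pretrained = {"embedding.weight": "LJSpeech embedding", "decoder.weight": "LJSpeech decoder"}
    fresh = {"embedding.weight": "random embedding", "decoder.weight": "random decoder"}
    print(warm_start_weights(pretrained, fresh))
    # {'embedding.weight': 'random embedding', 'decoder.weight': 'LJSpeech decoder'}
```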


Models will be saved to a new folder in Google Drive called ‘tacotron2_checkpoints’. The log 
directory will be saved there as well. 


Click Run Cell to begin training a new model. 


Assuming you don’t run into any errors, you should see a new model in the 
‘tacotron2_checkpoints’ folder after 200 iterations. Every 200 iterations your progress will be 
saved, overwriting the existing checkpoint. 


To resume training an existing model, scroll down to Resume Checkpoint. 


Note: You’ll need to make sure you’ve run Setup Tacotron2, Set Model Name and Configure hparams.py before resuming.
Figure 59 - Resume Checkpoint.


For the -c switch, change the name of the model and ensure the path to it is correct. By default I 
have this switch set to ‘/content/drive/MyDrive/tacotron2_checkpoints/tacotron_femalenord’. 


Click Run Cell to resume training a model. 
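The Warm Start and Resume Checkpoint cells presumably both end up calling Tacotron 2’s train.py; the difference is whether --warm_start is passed. Here’s a sketch that builds both command lines, assuming the option names from the NVIDIA README (--output_directory, --log_directory, -c and --warm_start) and assuming the default model was downloaded as tacotron2_statedict.pt; the function name is hypothetical. Quoting each argument matters if your path contains a space, as in ‘My Drive’.

```python
import shlex

def tacotron_train_command(output_directory, log_directory, checkpoint, warm_start):
    """Build the argument list for NVIDIA Tacotron 2's train.py."""
    args = [
        "python", "train.py",
        f"--output_directory={output_directory}",
        f"--log_directory={log_directory}",
        "-c", checkpoint,
    ]
    if warm_start:
        # Load weights only; optimizer state and iteration count are not restored.
        args.append("--warm_start")
    return args

if __name__ == "__main__":
    drive = "/content/drive/MyDrive/tacotron2_checkpoints"
    new_model = tacotron_train_command(drive, "logdir", "tacotron2_statedict.pt", warm_start=True)
    resume = tacotron_train_command(drive, "logdir", drive + "/tacotron_femalenord", warm_start=False)
    for cmd in (new_model, resume):
        print(" ".join(shlex.quote(a) for a in cmd))
```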


TRAINING A WAVEGLOW MODEL IN GOOGLE COLAB


Scroll down to the section labelled Install WaveGlow. 


Run the code cell under Setup WaveGlow. 


Figure 60 - Setup WaveGlow. 
Run the code cell under Download Default WaveGlow Model. This will download the default WaveGlow model that we’ll be using to warm-start our own model from.


Figure 61 - Download Default WaveGlow Model.


In the code cell beneath Set Model Name, change ‘waveglow_femalenord’ to match the voice 
type of the dataset you’ve prepared. 
Figure 62 - Set Model Name.


By default, WaveGlow saves individual checkpoints as it trains as ‘waveglow_{}’. Removing ‘{}’ from the model name tells WaveGlow to keep just one checkpoint and overwrite it whenever it saves a new one.


Note: Google Drive is limited to 15 GB of storage space and this can be used up pretty quickly when 
training a Tacotron or WaveGlow model if we save checkpoints individually. 


Click Run Cell once you’ve changed the model name. 


Scroll down to Set Things Up. 
Figure 63 - Set Things Up.


Set the path to the folder containing your .wav files in the areas highlighted in the screenshot 
above. 
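The sed command in this cell swaps the relative ‘wavs/’ prefix in train_files.txt for the full path to your dataset. A Python version of the same idea is sketched below; it assumes train_files.txt lists one ‘wavs/name.wav’ path per line and lives in /content/waveglow, and rewrite_wav_paths is a hypothetical helper. Unlike sed’s global substitution, it only touches the start of each line, and it also counts entries that don’t exist on disk.

```python
import os

def rewrite_wav_paths(lines, old_prefix, new_prefix):
    """Swap a leading relative folder (e.g. 'wavs/') for a full path.

    Only the start of each line is changed; blank lines are dropped.
    """
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        result.append(line)
    return result

if __name__ == "__main__":
    # Paths assumed from this tutorial's femalenord example.
    train_list = "/content/waveglow/train_files.txt"
    wav_folder = "/content/drive/MyDrive/femalenord/wavs/"
    if os.path.exists(train_list):
        with open(train_list) as f:
            fixed = rewrite_wav_paths(f, "wavs/", wav_folder)
        missing = [p for p in fixed if not os.path.isfile(p)]
        with open(train_list, "w") as f:
            f.write("\n".join(fixed) + "\n")
        print(f"{len(fixed)} entries, {len(missing)} missing on disk")
```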


Click Run Cell once you’ve changed those paths. 


We won’t need the training and validation files we made earlier for Tacotron. This cell will create its own train_files.txt file for WaveGlow to use. It will also tell WaveGlow to save checkpoints to the ‘waveglow_checkpoints’ folder back in Google Drive. The log directory will be saved there as well.
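The same cell also edits WaveGlow’s config.json with sed. Since config.json is plain JSON, you could make those changes with Python’s json module instead. This sketch assumes the repository’s usual layout (a train_config section holding output_directory, checkpoint_path and with_tensorboard, and a data_config section holding training_files); the function name and the example paths are assumptions based on this tutorial’s folder names. Passing checkpoint_path covers the resume case as well.

```python
import json
import os

def update_waveglow_config(config, output_directory, training_files,
                           checkpoint_path=None):
    """Return an edited copy of a WaveGlow config dictionary."""
    cfg = json.loads(json.dumps(config))  # cheap deep copy
    train = cfg["train_config"]
    train["output_directory"] = output_directory
    train["with_tensorboard"] = True  # write logs for Tensorboard
    if checkpoint_path is not None:
        train["checkpoint_path"] = checkpoint_path  # only needed when resuming
    cfg["data_config"]["training_files"] = training_files
    return cfg

if __name__ == "__main__":
    config_file = "/content/waveglow/config.json"  # assumed clone location
    if os.path.exists(config_file):
        with open(config_file) as f:
            config = json.load(f)
        config = update_waveglow_config(
            config,
            output_directory="/content/drive/MyDrive/waveglow_checkpoints",
            training_files="/content/waveglow/train_files.txt",
        )
        with open(config_file, "w") as f:
            json.dump(config, f, indent=4)
```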


If you’re training a new model, run the first two code cells under Run This If Warm Starting. 


Figure 64 - Run This If Warm Starting. 


This will convert the default WaveGlow model ‘waveglow_256channels_ljs_v2.pt’ to allow us to 
warm start from it. 


Open the Files pane on the left-hand side. Navigate to content > waveglow and open train.py.











Figure 65 - Opening train.py. 


Scroll down to line 44 in train.py. Change

iteration = checkpoint_dict['iteration']

to

iteration = 0

Comment out line 45, optimizer.load_state_dict(checkpoint_dict['optimizer']), by adding a ‘#’ in front of it.
Figure 66 - train.py modified.
Press CTRL+S to save the document, then close train.py. 


This will allow us to use the default WaveGlow model we just converted. 
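If you’d rather not edit train.py by hand each session, the two changes can be scripted. This is a sketch that assumes the lines read exactly iteration = checkpoint_dict['iteration'] and optimizer.load_state_dict(checkpoint_dict['optimizer']), as they do in the NVIDIA WaveGlow repository; patch_for_warm_start is a hypothetical helper. It refuses to patch a file that has already been changed, and it reads the whole file before rewriting it so a failed patch leaves train.py untouched.

```python
import os

OLD_ITERATION = "iteration = checkpoint_dict['iteration']"
NEW_ITERATION = "iteration = 0"
OLD_OPTIMIZER = "optimizer.load_state_dict(checkpoint_dict['optimizer'])"

def patch_for_warm_start(source):
    """Reset the starting iteration and skip loading the optimizer state."""
    if OLD_ITERATION not in source:
        raise ValueError("expected iteration line not found (already patched?)")
    patched = []
    for line in source.splitlines(keepends=True):
        indent = line[: len(line) - len(line.lstrip())]
        if line.strip() == OLD_ITERATION:
            line = indent + NEW_ITERATION + "\n"
        elif line.strip() == OLD_OPTIMIZER:
            line = indent + "#" + OLD_OPTIMIZER + "\n"
        patched.append(line)
    return "".join(patched)

if __name__ == "__main__":
    train_py = "/content/waveglow/train.py"
    if os.path.exists(train_py):
        with open(train_py) as f:
            new_source = patch_for_warm_start(f.read())
        with open(train_py, "w") as f:
            f.write(new_source)
```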


Scroll down to Start Training. 
Figure 67 - Start Training.
Run the cell below it to begin training a new WaveGlow model. 


Assuming you don’t run into any errors, you should see a new model in the 
‘waveglow_checkpoints’ folder after 200 iterations. Every 200 iterations your progress will be 
saved, overwriting the existing checkpoint. 


To resume training an existing model, scroll down to Run This If Resuming Training. 


Note: you'll need to make sure you’ve run Setup WaveGlow, Set Model Name, and Set Things Up 
first. 
Figure 68 - Run This If Resuming Training.


By default this cell is set to resume training the waveglow_femalenord model, so change this to 
match the file name of the model you're training, then click Run Cell. 


Now scroll down to Start Training and click Run Cell to resume training a model. 


Training will continue from the last saved checkpoint. 


CHECKING PROGRESS IN TENSORBOARD 


We can check the progress of our Tacotron and WaveGlow models in Tensorboard.


Tensorboard will read the log files from the logdir folder that was saved to our checkpoint directories. In our example, we saved our Tacotron checkpoints to ‘tacotron2_checkpoints’. For WaveGlow, that folder is ‘waveglow_checkpoints’.


I would recommend testing your models after every 10,000 iterations just to check that things are on the right track. See the next section, Synthesizing audio from the models we’ve trained, for the steps to generate audio.





Generally speaking, training is considered ‘done’ once we’re satisfied with the output.


Scroll down to Tensorboard and run the code cell under Load Tensorboard Extension. 
Figure 69 - Load Tensorboard Extension.


Next, run the code cell under Import Tensorflow and Datetime. 


Figure 70 - Import Tensorflow and Datetime.


Run either the code cell under Run Tensorboard for Tacotron or Run Tensorboard for 
WaveGlow depending on which one you want to check. 


Figure 71 - Run Tensorboard.


Tacotron 2 Speech Synthesis Tutorial 51 




In the Scalars section, the graphs we need to pay attention to are the training and validation loss graphs.
Figure 72 - Tensorboard Scalars.


For Tacotron, once the training and validation loss curves flatten out, you should reduce the 
learning rate. 
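‘Flattened out’ is a judgment call you make by eye, but if you want something a little more objective you can compare recent loss values against earlier ones, for example values you’ve read off the Scalars graphs. A rough sketch: it compares the mean of the last few values with the mean of the few before them. The window size, the 1% threshold, the helper name and the sample numbers are all arbitrary assumptions, not values from the notebook.

```python
def has_flattened(losses, window=5, min_rel_improvement=0.01):
    """Rough plateau check on loss values ordered oldest first.

    True when the mean of the last `window` values improved on the mean of
    the `window` values before them by less than `min_rel_improvement`.
    """
    if len(losses) < 2 * window:
        return False
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous <= 0:
        return False
    return (previous - recent) / previous < min_rel_improvement

if __name__ == "__main__":
    # Made-up validation loss values, one per logging step.
    val_loss = [0.62, 0.55, 0.50, 0.47, 0.45, 0.44, 0.44, 0.439, 0.44, 0.438]
    print(has_flattened(val_loss))
```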
Figure 73 - Reducing learning rate.


You can do this back in the Configure hparams.py section by uncommenting the last line, then running that code cell again.


Generally, for datasets based on Skyrim voice acting, I train for 20,000 iterations before lowering 
the learning rate then train for another 5,000 to 10,000 iterations. Beyond that I don’t notice 
much improvement. 


You can check the alignment in the Images section. 
Figure 74 - Alignment.


Alignment shows how well the sounds generated by the decoder line up with the characters read by the encoder.


Good alignment looks like a clean diagonal line running from the bottom left to the top right.
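If you want a number to go with the picture, one rough check is to find the encoder position with the highest attention weight at each decoder step. In a healthy alignment those peaks move steadily forward and eventually cover the whole sentence. The sketch assumes the alignment is available as a list of rows indexed [decoder step][encoder step]; the function and both scores are hypothetical, not something Tacotron reports.

```python
def alignment_summary(alignment):
    """Score an attention matrix given as alignment[decoder_step][encoder_step].

    Returns (forward_fraction, coverage):
      forward_fraction: share of consecutive decoder steps whose most-attended
                        encoder position stayed put or moved forward
      coverage:         share of encoder positions that were the most-attended
                        position at least once
    """
    peaks = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    transitions = len(peaks) - 1
    forward = sum(1 for a, b in zip(peaks, peaks[1:]) if b >= a)
    forward_fraction = forward / transitions if transitions > 0 else 1.0
    coverage = len(set(peaks)) / len(alignment[0])
    return forward_fraction, coverage

if __name__ == "__main__":
    diagonal = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    stuck = [[1.0, 0.0, 0.0, 0.0] for _ in range(4)]
    print(alignment_summary(diagonal))  # (1.0, 1.0)
    print(alignment_summary(stuck))     # (1.0, 0.25)
```

A stuck alignment still scores as ‘forward’ because its peak never moves backwards, which is why the coverage score is reported alongside it.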
Figure 75 - What good alignment should look like.


SYNTHESIZING AUDIO FROM THE MODELS WE’VE TRAINED


Scroll down to the section labelled Install Tacotron 2. 


Run the code cell under Setup Tacotron 2. This will install Tacotron 2 and its dependencies to your Colab session.
Figure 76 - Installing Tacotron 2.


Now scroll down to Generate Audio and run the two code cells under Load Libraries. 
Figure 77 - Load libraries.


Run the code cell under Setup hparams. 
Figure 78 - Setup hparams.







Under Load Tacotron2 Model, set ‘checkpoint_path’ to point to your Tacotron model before 
running the code cell. For my femalenord example, that’s going to be 
‘/content/drive/My Drive/tacotron2_checkpoints/tacotron_femalenord’. 
Figure 79 - Load Tacotron2 Model.


Now under Load WaveGlow Model, set ‘waveglow_path’ to point to your WaveGlow model 
before running the code cell. For my femalenord example, that’s going to be 
‘/content/drive/My Drive/waveglow_checkpoints/waveglow_femalenord’. 
Figure 80 - Load WaveGlow Model.


Under Text Input, specify the line of dialogue you want to generate, then click Run Cell. Try to keep this down to one or two sentences at most. You can always edit the dialogue together later in an audio editing program like Audacity.


I marked the end of the line with ‘| ~’. Usually this isn’t necessary, but it helps tell Tacotron to stop once it reaches that character, in case your model tends to produce gibberish at the end of dialogue.
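For a longer passage, you can split it into one- or two-sentence pieces, generate each one separately, then join the clips in Audacity. A rough sketch of the splitting step: it breaks after ‘.’, ‘!’ or ‘?’, which is a simplifying assumption that will mis-split abbreviations, and both the function and its end_marker parameter (for whatever end-of-line character you use) are hypothetical.

```python
import re

def split_into_chunks(text, max_sentences=2, end_marker=""):
    """Split text into pieces of at most max_sentences sentences each."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    for i in range(0, len(sentences), max_sentences):
        chunk = " ".join(sentences[i:i + max_sentences])
        chunks.append(chunk + end_marker)
    return chunks

if __name__ == "__main__":
    text = "Never should've come here! Now stay out of my way. Or else."
    for chunk in split_into_chunks(text):
        print(chunk)
```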
Figure 81 - Text Input.


Run the code cell under Generate Mel Outputs. 
Figure 82 - Generate Mel Outputs.


Running the code cell under Synthesize Audio will generate raw, unprocessed output. Press the play button to listen to the audio. You’ll most likely notice some high-pitched humming. The default sigma value is ‘1’. Lowering this value can help remove some robotic-like sounds, but note that the further you lower it, the more muffled the voice will be. I usually keep this set to ‘0.85’, but you might need to lower it to ‘0.83’ depending on how many distortions you hear in the voice.


Figure 83 - Synthesize Audio. 


We'll need to run the Denoise code cell to clean things up. 
Figure 84 - Denoise.


The higher the strength of the denoiser, the more muffled your audio will sound, so try to keep this value just low enough to remove the high-pitched buzzing. I usually keep the strength set to ‘0.000’. The default is ‘0.1’, but I think this is a bit excessive.
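Roughly speaking, the WaveGlow repository’s denoiser works by spectral subtraction: it estimates a ‘bias’ spectrum (the hum the model produces from a silent input), multiplies it by strength, subtracts that from the magnitude spectrum of your audio and clips anything that goes below zero. The plain-Python toy below shows that single step on made-up numbers to illustrate why a higher strength removes more of the voice along with the hum; denoise_frame is a hypothetical name, and this is not the repository’s code, which operates on STFT tensors.

```python
def denoise_frame(magnitudes, bias, strength=0.1):
    """Spectral subtraction on one frame: subtract strength * bias, floor at zero."""
    return [max(m - b * strength, 0.0) for m, b in zip(magnitudes, bias)]

if __name__ == "__main__":
    frame = [1.0, 0.04, 0.5]  # made-up magnitudes: voice, hum, voice
    bias = [0.5, 0.5, 0.5]    # made-up estimate of the hum
    for strength in (0.01, 0.1):
        print(strength, denoise_frame(frame, bias, strength))
```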


You can download the audio as a .wav file by clicking on the three dots and selecting 
Download. 
Figure 85 - Downloading our audio.


To generate a new line of dialogue, just run the cells from Text Input onwards. 


Now that we’ve generated some audio samples, it’s time to try and improve audio quality in Audacity.


IMPROVING AUDIO QUALITY IN AUDACITY 


In this section I’ll be using Audacity to try and upsample the audio clips generated by Tacotron.


You can download Audacity for free from its website. 


Audio output from Tacotron will be at 22050Hz by default. While we can output audio at a 
higher sample rate, it likely won’t sound much better. 


Upsampling alone won’t do much to improve the quality of our audio. We need to try and boost 
some of the higher frequencies to add a bit more depth into the voice. 
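The reason upsampling alone doesn’t help is the Nyquist limit: audio sampled at 22050 Hz can’t contain anything above 11025 Hz. Resampling to 44100 Hz just fills in the extra samples from the existing ones. The toy 2x linear interpolator below illustrates this; it’s a deliberately simple stand-in (Audacity’s resampler is far more sophisticated), and the function names are hypothetical.

```python
def nyquist(rate_hz):
    """Highest frequency a signal sampled at rate_hz can represent."""
    return rate_hz / 2

def upsample_2x_linear(samples):
    """Roughly double the sample count by inserting the midpoint between neighbours.

    Every new value is computed from existing ones, so no new high-frequency
    detail is created; the output has 2n - 1 samples for n inputs.
    """
    out = []
    for a, b in zip(samples, samples[1:]):
        out.extend([a, (a + b) / 2])
    if samples:
        out.append(samples[-1])
    return out

if __name__ == "__main__":
    print(nyquist(22050), nyquist(44100))  # 11025.0 22050.0
    print(upsample_2x_linear([0, 2, 4]))    # [0, 1.0, 2, 3.0, 4]
```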


Important: There’s no good way to do this, and there are probably better methods than what I’m about to show you. But this is just the method that I use to upsample vocal recordings.


There are AI-based audio upsampling programs out there, but they tend to introduce more distortions, so I’m not going to cover them in this tutorial.


Open the .wav file in Audacity. 
Figure 86 - Changing the Project Rate.


Set the Project Rate (Hz) in the bottom-left corner to 44100. 


Press CTRL+A to select the entire audio track.


Figure 87 - Entire audio track selected. 


Now go to Tracks > Resample... 


Figure 88 - Resampling to 44100 Hz.





Make sure the ‘New sample rate (Hz)’ is set to 44100 then click OK. 


Go to Edit > Duplicate to copy the selected audio track. 
Figure 89 - Audio track duplicated.


Select the entirety of the duplicate track by double-clicking anywhere within it. 


Figure 90 - Duplicate track selected. 


Go to Effect > High Pass Filter... 


Figure 91 - High Pass Filter.


Set ‘Frequency (Hz)’ to 8000.0 and ‘Roll-off (dB per octave)’ to 36 dB, then click OK. This will isolate just the higher frequencies.
Figure 92 - High Pass Filter applied to duplicate track.
We need to raise the pitch of these higher frequencies a bit. With the duplicate track still 
selected, go to Effect > Change Pitch... 
Figure 93 - Change Pitch.


Set ‘Semitones (half-steps)’ to 1.90. Ensure ‘Use high quality stretching (slow)’ is ticked, then click OK.




The last thing we need to do is play around with the gain of the duplicate track. For male voices I usually set this to around +3 dB, depending on the voice, just to add a little more depth. For female voices I set this to -3 dB to lessen high-pitched hissing.


You'll need to play this by ear and find out what sounds best yourself. 
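When you export, Audacity mixes the two tracks together, so the duplicate’s gain decides how much of the pitched-up high end is added to the original. A decibel value maps to an amplitude multiplier of 10^(dB/20), which means +3 dB is roughly x1.41 and -3 dB roughly x0.71. Here’s a minimal sketch of that mix on plain sample lists; the function names are hypothetical, and in practice Audacity does all of this for you.

```python
def db_to_gain(db):
    """Convert a gain in decibels to an amplitude multiplier."""
    return 10 ** (db / 20)

def mix_tracks(original, duplicate, duplicate_gain_db):
    """Sum two equal-length tracks, scaling the duplicate by its gain setting."""
    gain = db_to_gain(duplicate_gain_db)
    return [o + gain * d for o, d in zip(original, duplicate)]

if __name__ == "__main__":
    print(round(db_to_gain(3), 3), round(db_to_gain(-3), 3))  # 1.413 0.708
    print(mix_tracks([0.1, -0.2], [0.05, 0.05], -3))
```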
Figure 94 - Gain controls.
To save a new .wav file, go to File > Export > Export as WAV. 


Select the folder you want to save the .wav file to then click Save. 


A COUPLE TIPS AND TRICKS


TIP 1: Sometimes part of the last word will get cut off, so what I like to do is add another word after it, like ‘end’, to help mitigate that.
Figure 95 - Ending word.


TIP 2: The first few words seem to have a large impact on the delivery of the rest of the 
dialogue. 


‘this is just a test’ may sound a lot different to ‘elf scum! this is just a test’. If you copy part of an existing line of dialogue from your dataset, like ‘never should’ve come here!’, and prefix it to the sentence you actually want to generate, some of the emotional intensity from the original line of dialogue will transfer across to the new line.
Figure 96 - Starting words.


If your dialogue is being screamed or shouted and you don’t want it to be, try prefixing 
something benign like ‘hello there’ and that should lower the emotional intensity of the rest of 
the dialogue. 


Capitalization doesn’t really make a difference to the output. 
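Tips 1 and 2 both come down to padding the text you send to Tacotron. If you generate a lot of lines and want to keep the padding consistent, a trivial sketch follows; build_prompt is a hypothetical helper, and you’d still cut the extra words out of the audio in Audacity afterwards.

```python
def build_prompt(line, prefix="", suffix=""):
    """Join an optional lead-in phrase, the line itself and an optional trailing word."""
    parts = [p.strip() for p in (prefix, line, suffix) if p.strip()]
    return " ".join(parts)

if __name__ == "__main__":
    print(build_prompt("this is just a test", prefix="hello there.", suffix="end"))
    # hello there. this is just a test end
```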


TIP 3: Every time you generate dialogue it'll sound different. If you don’t like the way a specific 
sentence was conveyed, just run the code cells from Generate Mel Outputs onwards and check 
again. 





If I hear some robotic-like noises or distortions, I just generate the dialogue over and over until I 
get a clean read. 


TIP 4: Not all words will be pronounced properly. The malenordcommander model I trained has issues with words like ‘dialogue’. In those cases I’d have to type in ‘die a log’ instead. To make malenordcommander pronounce ‘python’ properly (and not as ‘pee-thon’), I’d have to type it in as ‘piiethon’.


It’ll require a bit of trial and error to figure out what works in those cases.
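If you keep running into the same mispronunciations, you can store your respellings and apply them before generating. A minimal sketch using the two examples above; the dictionary is specific to the malenordcommander model, so treat it as a starting point for your own list, and apply_respellings is a hypothetical helper.

```python
import re

# Respellings from the malenordcommander examples above; build your own per model.
RESPELLINGS = {
    "dialogue": "die a log",
    "python": "piiethon",
}

def apply_respellings(text, respellings=RESPELLINGS):
    """Replace whole words, ignoring case, with their phonetic respellings."""
    for word, spoken in respellings.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, lambda m, s=spoken: s, text, flags=re.IGNORECASE)
    return text

if __name__ == "__main__":
    print(apply_respellings("I wrote this dialogue in Python."))
    # I wrote this die a log in piiethon.
```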